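Today's video covers scraping a very well-known film review site, Rotten Tomatoes.
It also shows how to download and save images from a web page, which is really handy~
Here is the code used in the video.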
# Scraping Rotten Tomatoes
import requests, bs4
url = "https://editorial.rottentomatoes.com/guide/2021-best-movies/"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36'}
htmlfile = requests.get(url, headers=headers)
objsoup = bs4.BeautifulSoup(htmlfile.text, 'lxml')
movies = objsoup.find_all('div', class_='row countdown-item')  # one block per movie
for movie in movies:
    number = movie.find('div', class_='countdown-index')
    print("Movie number:", number.text)
    name_1 = movie.select('h2 > a')  # every <a> that is a direct child of an <h2>
    name_2 = name_1[0].text
    print("Movie:", name_2)
    tomatoes = movie.find('span', class_='tMeterScore')  # Tomatometer score
    print("Grade:", tomatoes.text)
    consensus = movie.find('div', class_='info critics-consensus')  # critics consensus
    print(consensus.text)
    synopsis = movie.find('div', class_='info synopsis')  # plot synopsis
    print(synopsis.text)
    starring = movie.find('div', class_='info cast')  # starring cast
    print(starring.text)
    directed_by = movie.find('div', class_='info director')  # director
    print(directed_by.text)
    print("=" * 100)
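One thing to watch for in the script above: find returns None when an entry has no matching element, and reading .text from None raises an AttributeError. Here is a minimal sketch of a guard. The helper name safe_text is hypothetical (it is not in the video), and the example uses stand-in objects instead of real scraped elements so it runs without touching the site.

```python
from types import SimpleNamespace


def safe_text(element, default="N/A"):
    """Return the stripped text of a found element, or a default when nothing was found."""
    if element is None:
        return default
    return element.text.strip()


# Stand-ins for what movie.find(...) could return:
# an object with a .text attribute, or None when the element is missing.
found = SimpleNamespace(text="\n 97% \n")
missing = None

print("Grade:", safe_text(found))
print("Grade:", safe_text(missing))
```

In the loop this would look like safe_text(movie.find('span', class_='tMeterScore')), so one incomplete entry does not stop the whole run.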
# Downloading the movie posters
# Create a photo folder wherever you like, then change C:\\Users\\ASUS\\Desktop to the location of that photo folder
import requests, bs4
url = "https://editorial.rottentomatoes.com/guide/2021-best-movies/"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36'}
htmlfile = requests.get(url, headers=headers)
objsoup = bs4.BeautifulSoup(htmlfile.text, 'lxml')
fn = "C:\\Users\\ASUS\\Desktop\\photo\\"
number = 176  # the number of reviews on the site grows slowly, so check the page before running
movies = objsoup.find_all('div', class_='row countdown-item')
for movie in movies:
    picture_element = movie.find('img')
    picture_url = picture_element.get('src')
    print(picture_url)
    picture = requests.get(picture_url)
    picture.raise_for_status()  # raise an error if the download failed
    print("Image downloaded!")
    number -= 1
    img = fn + str(number) + ".png"  # build the file name
    with open(img, 'wb') as file:  # the with block flushes and closes the file on exit
        file.write(picture.content)  # write the image bytes
    print("Image " + str(number) + " saved!")
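The download script above hard-codes the starting number and saves every file as .png. As a rough sketch of an alternative, the numbering can be taken from how many image URLs were collected, and the extension from the URL itself. The function names poster_filename and numbered_filenames and the example.com URLs are made up for illustration, and the countdown numbering assumes the page lists movies from the highest number down to 1.

```python
import os
from urllib.parse import urlparse


def poster_filename(folder, index, picture_url, default_ext=".jpg"):
    """Build a file path that keeps the extension found in the image URL."""
    path = urlparse(picture_url).path            # drops any ?query part
    ext = os.path.splitext(path)[1] or default_ext
    return os.path.join(folder, f"{index}{ext}")


def numbered_filenames(folder, picture_urls):
    """Number the files counting down from the total, ending at 1."""
    total = len(picture_urls)
    return [
        poster_filename(folder, total - offset, picture_url)
        for offset, picture_url in enumerate(picture_urls)
    ]


# Hypothetical URLs, only to show the naming; nothing is downloaded here.
sample_urls = [
    "https://example.com/a/poster1.jpg?x=1",
    "https://example.com/b/poster2.png",
    "https://example.com/c/noext",
]
for name in numbered_filenames("photo", sample_urls):
    print(name)
```

With this approach there is no need to check the page for the current count before each run, since the count comes from the scraped list.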
This video and its code are for research use only. Please don't scrape heavily or maliciously and put a burden on the other site!
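One simple way to go easy on a site is to pause between requests. The sketch below is only an illustration: fetch_politely is a hypothetical helper, the one-second default is an arbitrary choice, and the fetch argument stands in for whatever actually downloads a URL (for example a small function wrapping requests.get).

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for position, url in enumerate(urls):
        if position > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results.append(fetch(url))
    return results


# Demo with a fake fetch function and no real delay, so nothing is downloaded.
demo = fetch_politely(["a", "b", "c"], fetch=str.upper, delay_seconds=0)
print(demo)
```

The sleep parameter is there only so the pause can be swapped out when trying the function; in normal use the default time.sleep does the waiting.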
If anything in the video was unclear or wrong, feel free to leave a comment and let me know. Thank you for your feedback.